Reveal the social areas of the city of Engagement, Ohio USA.
This take-home exercise aims to reveal the social areas of the city of Engagement, Ohio, USA, and to visualize and analyze locations with traffic bottlenecks, using visualization techniques in R.
Assuming the volunteers are representative of the city’s population, characterize the distinct areas of the city that you identify. For each area you identify, provide your rationale and supporting data.
Where are the busiest areas in Engagement? Are there traffic bottlenecks that should be addressed? Explain your rationale.
Links to the dataset:
TravelJournal.csv
Employers.csv
Pubs.csv
Restaurants.csv
Buildings.csv
Schools.csv
Apartments.csv
Packages, namely `tidyverse`, `sftime`, `ViSiElse` and `tmap`, are required for this exercise. The code chunk below installs the required packages and loads them into the RStudio environment.
packages = c('tidyverse', 'ggplot2', 'ggdist', 'ggridges', 'patchwork',
             'ggthemes', 'hrbrthemes', 'ggrepel', 'ggforce', 'HH', 'vcd',
             'scales', 'grid', 'gridExtra', 'formattable', 'readr',
             'ggiraph', 'plotly', 'DT', 'gganimate', 'readxl', 'gifski',
             'gapminder', 'treemap', 'treemapify', 'rPackedBar',
             'ggstatsplot', 'ggside', 'broom', 'crosstalk', 'ViSiElse',
             'zoo', 'lubridate', 'remotes', 'trelliscopejs', 'data.table',
             'sf', 'tmap', 'clock', 'sftime', 'rmarkdown')
for (p in packages){
  if(!require(p, character.only = T)){
    install.packages(p)
  }
  library(p, character.only = T)
}
The datasets used in this exercise are published by the IEEE for the [VAST Challenge 2022](https://vast-challenge.github.io/2022/).
The code chunk below imports Buildings.csv, TravelJournal.csv, ParticipantStatusLogs10.csv, Schools.csv, Employers.csv, Pubs.csv, Restaurants.csv and Apartments.csv from the data folder into R, using the read_csv() function of readr and the read_sf() function of sf, and saves them as tibble data frames called buildings, travel, logs, schools, employers, pubs, restaurants and apartments.
The TravelJournal.csv contains information about participants’ motivation for movement around the city.
travel <- read_csv("data/TravelJournal.csv")
summary(travel)
schools <- read_sf("data/Schools.csv",
options = "GEOM_POSSIBLE_NAMES=location")
pubs <- read_sf("data/Pubs.csv",
options = "GEOM_POSSIBLE_NAMES=location")
restaurants <- read_sf("data/Restaurants.csv",
options = "GEOM_POSSIBLE_NAMES=location")
buildings <- read_sf("data/Buildings.csv",
options = "GEOM_POSSIBLE_NAMES=location")
apartments <- read_sf("data/Apartments.csv",
options = "GEOM_POSSIBLE_NAMES=location")
employers <- read_sf("data/Employers.csv",
options = "GEOM_POSSIBLE_NAMES=location")
logs <- read_sf("data/ParticipantStatusLogs10.csv",
options = "GEOM_POSSIBLE_NAMES=currentLocation")
glimpse(logs)
The travel journal contains the travel records of each participant, with purposes such as Work/Home Commute, Eating, Coming Back From Restaurant, Recreation (Social Gathering) and Going Back to Home.
The schools, buildings, restaurants, pubs, apartments and employers datasets give us the locations of distinct places in the city.
The participant logs give us the routine log of each participant over a certain time period.
Calculating the difference between the ending balance and the starting balance in the travel journal gives us the amount spent by the participant at a particular location.
travel$amountSpent <- travel$endingBalance - travel$startingBalance
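As a quick sanity check of this calculation, here is a minimal sketch on a small synthetic tibble (the balances are hypothetical, not from the actual journal):

```r
library(tibble)

# Hypothetical balances, for illustration only
demo <- tibble(participantId   = c("23", "876"),
               startingBalance = c(100.0, 250.0),
               endingBalance   = c(85.5, 250.0))

# Same formula as above: ending balance minus starting balance
demo$amountSpent <- demo$endingBalance - demo$startingBalance
demo$amountSpent
#> [1] -14.5   0.0
```

Note that with this formula a purchase shows up as a negative value; take abs() or flip the sign if a positive spend figure is preferred downstream.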
We use the weekdays(), day(), month() and year() functions to extract the day of the week, date, month and year of check-in to perform time-series visualizations.
Travel time is calculated as the difference between the travel end time and the travel start time, and time spent as the difference between the check-out and check-in times.
data_travel <- travel %>%
  mutate(weekday = weekdays(checkInTime),
         day = day(checkInTime),
         month = as.character(checkInTime, "%b %y"),
         year = year(checkInTime),
         monthYear = floor_date(checkInTime, "month"),
         travelEndLocationId = as.character(travelEndLocationId),
         timeSpent = checkOutTime - checkInTime,
         travelTime = travelEndTime - travelStartTime,
         participantId = as.character(participantId),
         purpose = as.character(purpose))
data_travel$timeSpent <- as.numeric(as.character(data_travel$timeSpent))
data_travel$travelTime <- as.numeric(as.character(data_travel$travelTime))
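One caveat with subtracting POSIXct columns directly, as above, is that `-` returns a difftime whose units R picks automatically based on the magnitude of the differences, so as.numeric() may yield seconds for one dataset and minutes for another. A safer alternative, sketched here on hypothetical timestamps, is difftime() with explicit units:

```r
# Hypothetical check-in/check-out times, for illustration only
checkIn  <- as.POSIXct(c("2022-03-01 09:00:00", "2022-03-01 12:15:00"),
                       tz = "UTC")
checkOut <- as.POSIXct(c("2022-03-01 09:45:00", "2022-03-01 13:00:00"),
                       tz = "UTC")

# Fixing units = "mins" makes the numeric conversion unambiguous
timeSpentMins <- as.numeric(difftime(checkOut, checkIn, units = "mins"))
timeSpentMins
#> [1] 45 45
```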
data_travel <- data_travel[,c("participantId","travelStartLocationId", "travelEndLocationId", "purpose", "checkInTime", "amountSpent","timeSpent","travelTime","weekday","day","month","year","monthYear")]
# A tibble: 6 × 13
participantId travelStartLocationId travelEndLocationId purpose
<chr> <dbl> <chr> <chr>
1 23 532 894 Recreation …
2 876 NA 1804 Eating
3 902 NA 1801 Eating
4 919 NA 1802 Eating
5 154 NA 446 Eating
6 509 NA 1801 Eating
# … with 9 more variables: checkInTime <dttm>, amountSpent <dbl>,
# timeSpent <dbl>, travelTime <dbl>, weekday <chr>, day <int>,
# month <chr>, year <int>, monthYear <dttm>
Simple feature collection with 6 features and 11 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: -3624.161 ymin: 2870.49 xmax: -1791.201 ymax: 5848.29
CRS: NA
# A tibble: 6 × 12
timestamp currentLocation participantId currentMode
<chr> <POINT> <chr> <chr>
1 2022-04-26T00:2… (-2459.039 2982.023) 314 AtHome
2 2022-04-26T00:2… (-3624.161 5661.512) 315 AtHome
3 2022-04-26T00:2… (-1791.201 5848.29) 316 AtHome
4 2022-04-26T00:2… (-2384.221 2870.49) 317 AtHome
5 2022-04-26T00:2… (-2459.754 3053.668) 318 AtHome
6 2022-04-26T00:2… (-3339.739 5572.409) 319 AtHome
# … with 8 more variables: hungerStatus <chr>, sleepStatus <chr>,
# apartmentId <chr>, availableBalance <chr>, jobId <chr>,
# financialStatus <chr>, dailyFoodBudget <chr>,
# weeklyExtraBudget <chr>
logs_path <- logs %>%
  mutate(Timestamp = date_time_parse(timestamp,
                                     zone = "",
                                     format = "%Y-%m-%dT%H:%M:%S")) %>%
  mutate(day = get_day(Timestamp)) %>%
  filter(currentMode == "Transport")
tmap_mode("view")
tm_shape(buildings)+
tm_polygons(col = "grey60",
size = 1,
border.col = "black",
border.lwd = 1)
tmap_mode("plot")
We notice that employers are distributed throughout the city, with the greatest concentration in the central part of the city, which appears more connected.
tmap_mode("view")
tm_shape(buildings)+
tm_polygons(col = "grey60",
size = 1,
border.col = "black",
border.lwd = 1) +
tm_shape(employers) +
tm_dots(col = "blue")
We notice that two schools are present in the north-east block of the city, one in the central-east and one in the southern block.
tmap_mode("view")
tm_shape(buildings)+
tm_polygons(col = "grey60",
size = 1,
border.col = "black",
border.lwd = 1) +
tm_shape(schools) +
tm_dots(col = "yellow")
It is found that the restaurants are concentrated in the northern and central blocks of the city as compared to the southern block, which hosts only three restaurants.
tmap_mode("view")
tm_shape(buildings)+
tm_polygons(col = "grey60",
size = 1,
border.col = "black",
border.lwd = 1) +
tm_shape(restaurants) +
tm_dots(col = "red")
Housing locations (apartments) are clustered in certain blocks of the city.
tmap_mode("view")
tm_shape(buildings)+
tm_polygons(col = "grey60",
size = 0.5,
border.col = "black",
border.lwd = 1) +
tm_shape(apartments) +
tm_dots(col = "orange")
hex_buildings <- st_make_grid(buildings,
cellsize=100,
square=FALSE) %>%
st_sf() %>%
rowid_to_column('hex_id_buil')
points_in_hex_buil <- st_join(logs_path,
hex_buildings,
join=st_within) %>%
st_set_geometry(NULL) %>%
count(name='pointCount', hex_id_buil)
head(points_in_hex_buil)
# A tibble: 6 × 2
hex_id_buil pointCount
<int> <int>
1 169 16
2 212 49
3 225 12
4 226 93
5 227 14
6 228 36
hex_employers <- st_make_grid(employers,
cellsize=100,
square=FALSE) %>%
st_sf() %>%
rowid_to_column('hex_id_emp')
points_in_hex_emp <- st_join(logs_path,
hex_employers,
join=st_within) %>%
st_set_geometry(NULL) %>%
count(name='pointCount', hex_id_emp)
head(points_in_hex_emp)
# A tibble: 6 × 2
hex_id_emp pointCount
<int> <int>
1 25 19
2 42 18
3 68 24
4 69 33
5 70 12
6 71 16
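To turn the hexagon counts above into a map that highlights candidate bottlenecks, the counts can be joined back onto the hexagon grid and shaded with tmap. This is a sketch assuming the `hex_buildings` grid and the `points_in_hex_buil` counts from the earlier chunks; hexagons with no transport points are filled in as zero:

```r
# Join the transport-point counts back to the hexagon geometry
hex_counts <- hex_buildings %>%
  left_join(points_in_hex_buil, by = 'hex_id_buil') %>%
  replace_na(list(pointCount = 0))

# Shade hexagons by count; the darkest cells are candidate bottlenecks
tmap_mode("plot")
tm_shape(hex_counts) +
  tm_fill("pointCount",
          n = 8,
          style = "quantile") +
  tm_borders(alpha = 0.1)
```

Quantile breaks spread the hexagons evenly across the colour classes, which keeps the few extremely busy cells from washing out the rest of the map.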